Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Posters

Posters at ISMB 2020 will be presented virtually. Authors will pre-record their poster talk (5-7 minutes) and upload it to the virtual conference platform along with a PDF of their poster. All registered conference participants will have access to the poster and presentation through the conference platform until October 31, 2020. A chat function provides Q&A opportunities for interaction between presenters and participants.

Preliminary information on preparing your poster and poster talk is available at: https://www.iscb.org/ismb2020-general/presenterinfo#posters

Ideally authors should be available for interactive chat during the times noted below:

Poster Session A: July 13 and July 14, 7:45 am - 9:15 am Eastern Daylight Time
Poster Session B: July 15 and July 16, 7:45 am - 9:15 am Eastern Daylight Time
July 14 between 10:40 am - 2:00 pm EDT
A combined machine learning and glycomics analysis for the identification and discrimination of plant gum species used in historical artworks
COSI: CompMS COSI
  • Amra Aksamija, Art Institute of Chicago, United States
  • Clara Granzotto, Art Institute of Chicago, United States
  • Ken Sutherland, Art Institute of Chicago, United States
  • Narasimhan Balakrishnan, Northwestern University, United States
  • Neda Bagheri, University of Washington, United States

Short Abstract: Plant gums, the exudates generated on the branches and trunks of certain trees in response to external attack, have often found use in artworks as adhesives and paint binders, such as in watercolor paints. Identifying the different species of gums used in historical artworks opens avenues into technical investigations regarding plant sources, trade routes, and material selection by artists in the past. In this work, we describe a machine learning approach to identify signatures of gums from the three genera most commonly used in cultural heritage (Acacia, Astragalus and Prunus) based on MALDI-MS spectra obtained from individual reference gum samples as well as historical artworks. Our method involves a modified t-SNE based nonlinear dimension reduction, which distinguishes among gums of the three plant genera as well as some of the different species within the same genus. This approach also facilitates gum identification in historic samples, avoiding laborious manual comparison with the built database. Beyond the realm of artwork, our analysis can be extended to other domains such as pharmaceuticals and biomedical applications, where one wishes to distinguish different (bio)chemical species using an experimental omics and mass spectrometry-based pipeline.

A comprehensive multi-omics investigation of metabolic dysregulation in blood-stage malaria
COSI: CompMS COSI
  • Cathy Shang Kuan, McGill University, Canada
  • Zhiqiang Pang, Institute of Parasitology, McGill University, Canada
  • Jasmine Chong, Institute of Parasitology, McGill University, Canada
  • Jianguo Xia, Institute of Parasitology, McGill University, Canada

Short Abstract: Despite continued efforts towards eradication, malaria remains a significant global health burden. Drug resistance threatens to reverse progress in malaria control, highlighting the urgent need to identify novel therapeutic targets for malaria treatment. Metabolic reprogramming is an emerging mechanism by which parasites induce metabolic alterations to dampen host immune responses and facilitate their survival. However, the role of metabolism in malaria infections remains underexplored. To address this gap, we first aim to characterize molecular perturbations in humans with blood-stage malaria (Plasmodium vivax) using metabolomics data obtained from several studies (number of P. vivax patients: 242). This is achieved using a newly developed approach for pathway-level meta-analysis of untargeted LC-MS based metabolomics data. On a subset of patients, we also showcase a novel network-based algorithm that leverages the P. vivax genome-scale metabolic model to integrate metabolite features with transcriptomics data and identify modules of metabolic dysregulation in P. vivax infections. The identification of P. vivax-induced alterations will yield mechanistic insights into malaria pathogenesis, and these alterations may serve as diagnostic biomarkers. Ultimately, these changes can be exploited to identify metabolic drug targets and improve malaria elimination and eradication.

A framework for automatically identifying low-quality peaks in untargeted LC-MS metabolomics data
COSI: CompMS COSI
  • Kelsey Chetnik, Icahn School of Medicine at Mount Sinai, United States
  • Lauren Petrick, Icahn School of Medicine at Mount Sinai, United States
  • Gaurav Pandey, Icahn School of Medicine at Mount Sinai, United States

Short Abstract: Liquid chromatography paired with high-resolution mass spectrometry (LC-MS) is commonly used for untargeted metabolomics analyses. Despite the availability of several pre-processing software packages for generating feature/peak abundance data from these analyses, significant challenges persist, including large variation in peak detection across software, a high prevalence of false positive detections, and poor integrations. Without automatic and objective methods to assess integration quality in LC-MS data, manual assessment, which is time-consuming and subjective, is the only way to ensure that poor peak integrations do not propagate to downstream analyses. To address this challenge, we developed a novel computational framework that automatically identifies low-quality peaks in untargeted LC-MS data using a combination of machine learning techniques and peak quality metrics. Specifically, we adapted seven metrics developed for targeted proteomics, and five for untargeted metabolomics LC-MS data. We then trained and evaluated nine classification algorithms on these two metric sets, as well as their union, on a development set to determine the algorithm and metric set whose combination yields the best-performing peak quality classifier. Validation on two independent test sets showed that this classifier performed well in distinguishing low-quality peaks from high-quality ones. Our framework, as well as this classifier, is available as the MetaClean package at github.com/GauravPandeyLab/MetaClean.

A machine learning approach for deciphering protein-protein interactions in human plasma
COSI: CompMS COSI
  • Emily Roth, University of Ottawa, Canada
  • Diane Forget, Institut de recherches cliniques de Montréal, Canada
  • Vanessa Gaspar, Institut de recherches cliniques de Montréal, Canada
  • Steffany Bennett, University of Ottawa, Canada
  • Marie-Soleil Gauthier, Institut de recherches cliniques de Montréal, Canada
  • Benoit Coulombe, Institut de recherches cliniques de Montréal, Université de Montréal, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: Immunoprecipitation coupled to mass spectrometry (IP-MS) methods are often used to identify protein-protein interactions (PPIs). While these approaches are prone to false-positive identifications through contamination and antibody non-specific binding, their results can be filtered by combining the use of negative controls and computational modelling. Such filtering does not effectively detect false-positive interactions when IP-MS is performed on human plasma samples, given a higher propensity for non-specific interactions. Therein, proteins cannot be overexpressed or inhibited, and existing modelling tools are not adapted for execution without such controls. Hence, we propose a novel machine learning-based approach for identifying PPIs in human plasma using IP-MS leveraging negative controls that include antibodies targeting proteins not known to be present in human plasma. Unsupervised machine learning algorithms are first applied to label-free MS quantification data to identify a set of high-quality controls. Our method then uses a logistic regression classifier to assess the reliability of PPIs detected in IP-MS experiments using antibodies targeting known plasma proteins. When applied to 882 putative interactions, our algorithm identified 29 PPIs with an FDR of 16.3%. This method provides an unprecedented ability to detect human plasma PPIs, enabling a better understanding of the biological processes in plasma.

A Machine Learning Paradigm for Classifying Lipids and Other Metabolites into Major Structural Categories
COSI: CompMS COSI
  • Elizabeth Mahood, Cornell University, United States

Short Abstract: The majority of metabolites detected in untargeted LC-MS metabolomic studies of complex biological samples currently receive no or unreliable structural annotation. Here, we have developed machine learning models capable of predicting the structural classes of these metabolites with high accuracy.
Through a grid search of feature-based models, we identified random forest models that classified lipids into LipidMAPS and LipidBlast ontologies, with 95-100% accuracy for most classes using just the chemical formula. Addition of in silico MS/MS fragmentation data further improved the multi-class model accuracy for 28/30 LipidBlast classes. To test algorithmic performance in classifying compounds beyond lipids, formula and MS/MS based models were successfully developed for classifying diverse metabolites from PlantCyc and flavonoids from the ReSpect for Phytochemicals databases, respectively. In the absence of a metabolome-spanning single-label ontology, we trained models for multi-label classification into the ChemOnt ontology, and achieved an overall accuracy of 85% using just the chemical formula, which may be further improved using MS/MS. Ongoing work seeks to expand the utility of this approach to predicting plant-derived compound classes using formula and MS/MS features.
Accurate structural classification of LC-MS features will reveal the chemistries of thousands of metabolites currently left unannotated and provide deeper insights into many biological processes.
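
As an illustration of how a chemical formula alone can be turned into model input, the following is a minimal sketch in Python. The abstract does not say which formula-derived features were used; plain element counts, the element list, and the function names here are assumptions made for illustration only.

```python
import re
from collections import Counter


def formula_to_counts(formula: str) -> Counter:
    """Parse a simple molecular formula such as 'C6H12O6' into element counts.

    Assumes a flat formula: no parentheses, charges, or isotope labels.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def formula_features(formula: str, elements=("C", "H", "N", "O", "P", "S")) -> list:
    """Return a fixed-length vector of element counts (hypothetical feature set)."""
    counts = formula_to_counts(formula)
    return [counts.get(element, 0) for element in elements]
```

A vector such as `formula_features("C42H82NO8P")` could then be passed to any standard classifier, for example a random forest.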

A supervised learning approach to predict intraductal carcinoma in prostate tissue using mass spectrometry imaging
COSI: CompMS COSI
  • Soroush Shahryari Fard, University of Ottawa, Canada
  • Assia Ait Slimane, Centre de recherche du Centre Hospitalier Universitaire de Montréal and Institut du cancer de Montréal, Canada
  • Noémi Roy, Centre de recherche du Centre Hospitalier Universitaire de Montréal and Institut du cancer de Montréal, Canada
  • Afnan Al-Saleh, Centre de recherche du Centre Hospitalier Universitaire de Montréal and Institut du cancer de Montréal, Canada
  • Mirela Birlea, Centre de recherche du Centre Hospitalier Universitaire de Montréal and Institut du cancer de Montréal, Canada
  • Nidia Lauzon, Université de Montréal, Canada
  • Pierre Chaurand, Université de Montréal, Canada
  • Dominique Trudel, Centre de recherche du Centre Hospitalier Universitaire de Montréal and Institut du cancer de Montréal, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: Intraductal carcinoma (IDC) is a type of prostate cancer lesion that is associated with a high risk of recurrence and a poor prognosis. However, reliable biomarkers for IDC are lacking, thereby complicating its diagnosis. Conventional proteomics approaches are poorly equipped to discover IDC-specific biomarkers, since IDC typically only constitutes a small portion of tumor volume. Mass spectrometry imaging (MSI) is an alternative approach that detects and quantifies peptides in situ in tissue. We therefore aimed to detect a proteomic signature of tumors with IDC using matrix-assisted laser desorption/ionization (MALDI) MSI. To achieve this goal, we built a computational pipeline that performs a statistical and machine learning analysis on the MSI data of 35 patients (20 with IDC and 15 without) to identify peptides that discriminate IDC and non-IDC patients. Our method identified 551 significantly differentially expressed peptides in prostate tumor samples (FDR-adjusted p-value < 0.05 and fold-change > 2 or < 0.5). Fifteen of these peptides were used to train a logistic regression classifier predicting patients’ IDC status with a sensitivity and specificity of 85% and 90%, respectively. We show that our approach can identify IDC-specific biomarkers and accurately classify prostate cancer patients, which will improve prostate cancer diagnosis and treatment.
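
The selection criterion quoted above (FDR-adjusted p-value < 0.05 and fold-change > 2 or < 0.5) can be sketched as follows. The abstract does not name the FDR procedure; the Benjamini-Hochberg adjustment used here is an assumption, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_peptides(p_values, fold_changes, alpha=0.05, fc_hi=2.0, fc_lo=0.5):
    """Indices of peptides passing both the FDR and the fold-change thresholds."""
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, (q, fc) in enumerate(zip(adjusted, fold_changes))
        if q < alpha and (fc > fc_hi or fc < fc_lo)
    ]
```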

Advancing fragmentation strategies development in LC-MS metabolomics.
COSI: CompMS COSI
  • Vinny Davies, University of Glasgow, United Kingdom
  • Joe Wandy, University of Glasgow, United Kingdom
  • Justin J.J. van der Hooft, Wageningen University, Netherlands
  • Stefan Weidt, University of Glasgow, United Kingdom
  • Rónán Daly, University of Glasgow, United Kingdom
  • Simon Rogers, University of Glasgow, United Kingdom

Short Abstract: The success of untargeted metabolomics analyses that make use of fragmentation spectra for molecule identification is heavily dependent on the quality and coverage of the fragment spectra acquisition strategy used. Here we demonstrate the development of a new Data Dependent Acquisition (DDA) strategy called SmartROI that overcomes some of the inefficiency in traditional Top-N DDA procedures. SmartROI targets ions that are likely to belong to chromatographic peaks by only fragmenting those that belong to regions of interest (traces of highly similar m/z values in a contiguous sequence of scans). Regions of interest (ROIs) are created in real time using a standard algorithm, and a set of simple rules determines when an ROI can be fragmented. Using a mass spectrometry acquisition simulator (VIMMS), we show how the parameters of SmartROI can be optimised in silico, saving costly machine time. We have implemented the optimised SmartROI on a Thermo Fusion Tribrid MS where it demonstrates substantially improved fragmentation coverage over a Top-N controller on two complex mixtures.
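
To make the notion of a region of interest concrete, here is a minimal sketch of greedy ROI construction from a stream of scans. It is not the SmartROI or VIMMS implementation: the abstract does not name the "standard algorithm", so the matching rule, the `ppm_tol` parameter, and the data layout are assumptions, and the fragmentation rules are left out because they are not described.

```python
def build_rois(scans, ppm_tol=10.0):
    """Group peaks into ROIs: traces of similar m/z across contiguous scans.

    scans: list of scans, each a list of (mz, intensity) tuples.
    Returns a list of ROIs, each a list of (scan_index, mz, intensity).
    """
    active, finished = [], []
    for scan_idx, peaks in enumerate(scans):
        still_active = []
        used = set()
        for roi in active:
            last_mz = roi[-1][1]
            best = None
            # Find the closest unused peak within the ppm tolerance.
            for j, (mz, _) in enumerate(peaks):
                if j in used:
                    continue
                if abs(mz - last_mz) / last_mz * 1e6 <= ppm_tol:
                    if best is None or abs(mz - last_mz) < abs(peaks[best][0] - last_mz):
                        best = j
            if best is None:
                # No matching peak in this scan: the trace is broken, close the ROI.
                finished.append(roi)
            else:
                used.add(best)
                roi.append((scan_idx, peaks[best][0], peaks[best][1]))
                still_active.append(roi)
        # Every unmatched peak starts a new ROI.
        for j, (mz, intensity) in enumerate(peaks):
            if j not in used:
                still_active.append([(scan_idx, mz, intensity)])
        active = still_active
    return finished + active
```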

Application of Differential Network Enrichment Analysis to decipher disease-related alterations of metabolism from high-throughput metabolomics data
COSI: CompMS COSI
  • Gayatri Iyer, Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, United States
  • Janis Wigginton, Michigan Regional Comprehensive Metabolomics Resource Core, Ann Arbor, MI, United States
  • William Duren, Michigan Regional Comprehensive Metabolomics Resource Core, Ann Arbor, MI, United States
  • Marci Brandenburg, Taubman Health Sciences Library, University of Michigan Medical School, Ann Arbor, MI, United States
  • Jennifer LaBarre, Department of Nutritional Sciences, University of Michigan School of Public Health, Ann Arbor, MI, United States
  • Charles Burant, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, United States
  • George Michailidis, Department of Statistics, University of Florida, Gainesville, FL, United States
  • Alla Karnovsky, University of Michigan, United States

Short Abstract: We developed a novel bioinformatics tool for the analysis and interpretation of metabolomics data. Based on our previously published Differential Network Enrichment Analysis (DNEA) methodology, we built a Java-based, user-friendly software tool that performs joint estimation of partial correlation network topology from input data for two groups of samples (e.g. normal-disease condition), identifies subnetworks via consensus clustering and determines enrichment using the NetGSA algorithm. This tool accommodates experimental designs and sample sizes often encountered in biomedical research. In the situation where input data contain a much larger number of metabolites than samples, the DNEA tool uses a combined knowledge- and data-driven approach to aggregate highly correlated metabolites into singular features, making a comparable feature space and sample space, thereby achieving higher statistical power to estimate interactions among these features. Further, we use a subsampling-based procedure to recover highly robust network edges from datasets with extremely imbalanced experimental groups.
DNEA was tested extensively on multiple metabolomics and lipidomics datasets from different metabolic disorders. Biologically relevant enriched subnetworks with strong associations with other phenotypes of interest were identified. These enriched subnetworks from partial correlations may provide deeper insights into disease mechanisms, as well as changes in the biochemistry underlying different physiological states.

Artificial Intelligence-Based TMT Experiment Planning Tool
COSI: CompMS COSI
  • Steven Eschrich, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Scott Cukras, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Bin Fang, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Eric Welsh, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Paul Stewart, H. Lee Moffitt Cancer Center and Research Institute, United States
  • John Koomen, H. Lee Moffitt Cancer Center and Research Institute, United States

Short Abstract: Tandem Mass Tag (TMT) proteomics is a “barcoding” approach for multiplexing samples in a single mass spectrometry proteomics experiment. Currently, 6-16 samples can be run simultaneously to detect and quantify peptides from complex sample mixtures. Our group used 29 TMT-6plex experiments in a large-scale analysis of lung squamous tumors and identified factors that impact proteomics experiments. These heuristics, or rules of thumb, include sample randomization, distributing samples with the same phenotypic characteristics across experiments and including control samples for batch correction. We have implemented a TMT experiment planner web tool for designing experiments based on these heuristics. A user can input a sample set, the number of TMT experiments and phenotypic variables. The system will identify a solution (experimental layout) that meets the constraints imposed by the heuristics. The key to this system is the open-source tool OptaPlanner, which is an artificial intelligence-based constraint solver. We encode constraints as hard constraints (cannot be violated) or soft constraints (preferences assigned numerical penalties). Once a solution is determined, a graphical experiment design is presented to the user. The user can then modify the design as needed. After planning is completed, a worksheet is generated with TMT labeling assignments and other experimental parameters.
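
The distinction between hard and soft constraints can be illustrated with a small scoring function. This is a sketch in Python, not the OptaPlanner API and not the planner's actual rule set: which heuristics are treated as hard or soft, the `"control"` label, and all names below are assumptions for illustration.

```python
from collections import Counter


def score_layout(layout, phenotype, n_experiments, capacity):
    """Score a candidate layout as (hard, soft) penalties; lower is better.

    layout: dict mapping sample name -> experiment index.
    phenotype: dict mapping sample name -> label ('control' marks controls).
    """
    per_experiment = [Counter() for _ in range(n_experiments)]
    for sample, experiment in layout.items():
        per_experiment[experiment][phenotype[sample]] += 1

    hard = 0
    for counts in per_experiment:
        # Hard: an experiment cannot hold more samples than it has channels.
        hard += max(0, sum(counts.values()) - capacity)
        # Hard (assumed): each experiment needs a control for batch correction.
        if counts["control"] == 0:
            hard += 1

    soft = 0
    # Soft (assumed): spread each phenotype evenly across experiments.
    for label in set(phenotype.values()) - {"control"}:
        per_label = [counts[label] for counts in per_experiment]
        soft += max(per_label) - min(per_label)
    return hard, soft
```

Because Python compares tuples element by element, a solver can rank candidate layouts directly on the returned pair: any hard violation outweighs every soft penalty.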

Automated mass spectrometry signal quantification using supervised deep learning
COSI: CompMS COSI
  • Avi Swartz, Harvard University, United States
  • Ali Rahnavard, George Washington University, United States

Short Abstract: Mass spectrometry (MS) is a crucial and widely used tool for measuring the intensity of small molecules in molecular biology experiments. Small molecules can be used as biomarkers for characterizing and investigating biological activities, including human health and disease status. Liquid chromatography and gas chromatography steps are used ahead of the MS step to separate molecules by retention time; however, some molecules still overlap in their mass-to-charge ratio (m/z) and are usually ignored by detection software. One approach to curating data for molecules that overlap in m/z is to manually assign start and end m/z to them. For this, an expert must take the output curves from the mass spectrometer and locate the start and stop times of the signal of interest. We have developed and evaluated massLens, an automated software pipeline that integrates the curve from the start to the stop time, which gives the quantity of the molecule that generated the signal. Determining the quantity of each molecule in the sample is the ultimate goal of the mass spectrometry experiment.
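
The integration step described above can be sketched with the trapezoidal rule. This is only an illustration: the abstract does not state how massLens integrates the curve or how its model locates the start and stop times, so the function name and the choice of rule are assumptions, and the boundaries are taken as given.

```python
def integrate_peak(times, intensities, start, stop):
    """Trapezoidal area under the signal between start and stop (inclusive).

    Assumes `times` is sorted in increasing order and aligned with `intensities`.
    """
    points = [(t, y) for t, y in zip(times, intensities) if start <= t <= stop]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area
```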

Bayesian annotations for targeted lipidomics (BATL): Accurate identification of lipid species quantified using LC-ESI-MS/MS in selective and multiple reaction monitoring modes
COSI: CompMS COSI
  • Justin G. Chitpin, University of Ottawa, Canada
  • Hongbin Xu, University of Ottawa, Canada
  • Priya Sarwal, University of Ottawa, Canada
  • Thao T. Nguyen, University of Ottawa Brain and Mind Research Institute / Ottawa Institute of Systems Biology, Canada
  • Graeme P. Taylor, University of Ottawa Brain and Mind Research Institute / Ottawa Institute of Systems Biology, Canada
  • Theodore J. Perkins, Ottawa Hospital Research Institute, Canada
  • Steffany Bennett, University of Ottawa, Canada

Short Abstract: Targeted lipidomic approaches based on liquid chromatography, electrospray ionization, tandem mass spectrometry (LC-ESI-MS/MS) excel in relative and absolute quantitation of lipid species, using selective and multiple reaction monitoring (SRM/MRM) modes to monitor transitions of parent and product ions.
SRM/MRM peaks are manually assigned identities under the assumption that an observed signal within a given retention time corresponds to a single lipid target. While a valid assumption for a single condition, peak features are fundamentally influenced by treatment, disease state, and tissue matrix. Manual curation becomes unreliable when multiple isobars, varying across conditions, are detected within the same transition and retention time window. The human curation approach is thus prone to identification error and biased towards reporting lipids of interest expected by the analyst. As a Lipidomics Standardization Initiative (LSI) platform, we developed BATL, a Gaussian naïve Bayes classifier that models distributions of input features across biological conditions. Peak identification in new samples is accomplished by maximizing the joint probabilities of all peaks' identities under that model. When tested against gold standard sphingolipid and glycerophosphocholine SRM datasets representing mixed matrices and conditions, BATL correctly identified over 95% of all annotated peaks, making it a powerful new tool for SRM/MRM identification.
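
For readers unfamiliar with the classifier type, a Gaussian naïve Bayes model can be sketched in a few lines. This is a generic textbook version, not the BATL implementation: the features, labels, and function names are hypothetical, and each peak is classified independently here, whereas the abstract describes maximizing the joint probability over all peaks.

```python
import math
from collections import defaultdict


def fit_gnb(X, y, var_floor=1e-6):
    """Fit per-class feature means, variances, and log priors."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant feature does not divide by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gnb(model, row):
    """Return the label with the highest log posterior for one feature vector."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)
```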

Challenges in Reproducibility of Multi-Omic Analysis
COSI: CompMS COSI
  • Ravali Adusumilli, Stanford University, United States
  • Daniel Garijo, University of Southern California, United States
  • Varun Ratnakar, University of Southern California, United States
  • Arunima Srivastava, Virginia Tech, United States
  • Yolanda Gil, University of Southern California, United States
  • Parag Mallick, Stanford University, United States

Short Abstract: Major breakthroughs in genome, transcriptome and proteome measurement have enabled the multi-omic studies required to advance our understanding of cellular regulation and dysregulation. Despite these advances, multi-omic analysis remains challenging. The analysis process typically involves dozens of interdependent steps for processing, transforming, and managing ‘omics data including file-format conversion, quality assessment, feature extraction, statistical inference, and network and pathway analysis. Furthermore, each of these steps can be performed by dozens of tools each with hundreds of parameters. These challenges not only make initial analysis difficult, they engender a series of cascading consequences to the reproducibility, archivability, robustness, extensibility and reusability of multi-omics studies.

The majority of studies into reproducibility have focused on experimental issues such as reagent variation and evolving or mistaken cell lines. Recently, the National Academies released a report on reproducibility and replicability in science. Motivated by the National Academies' recommendations, we attempted figure-for-figure reproduction of CPTAC publications as a case study to: 1) identify the major sources of irreproducibility in multi-omics studies; 2) generate an exemplar workflow-based multi-omic study that adheres to the National Academies' guidelines; and 3) quantify the impact of workflow variation and data extension on the major and minor results of the study.

Comparison of Protein Database Sources for Proteome Analysis in Non-Model Organisms
COSI: CompMS COSI
  • Mark Lubberts, University of Waterloo, Canada
  • Brendan McConkey, University of Waterloo, Canada

Short Abstract: Many omics techniques, including proteomics, are becoming increasingly accessible for studying non-model organisms. Despite new approaches for de novo peptide identification, database searching is still a key technique for identification of proteins from mass spectrometry data. However, most non-model organisms have poorly described proteomes, and researchers must choose alternate database sources such as proteins of related organisms, translations from transcriptomic data, or predicted proteins derived from genomic data. Here, we compare the impact of database choice on proteomic analysis of the fathead minnow (Pimephales promelas), commonly used in aquatic toxicology, using databases derived from the fathead minnow genome sequence, related organisms, and short- and long-read RNA sequencing experiments. Preliminary results show that protein identifications, PSM score distributions, and quantification results are heavily impacted by database choice, and that the intended database should inform the design of proteomics experiments in non-model organisms.

Computational optimization of differential mobility spectrometry parameters in lipidomics analyses
COSI: CompMS COSI
  • Xunxun Shi, Neural Regeneration Laboratory (University of Ottawa), Canada
  • Thao T. Nguyen, University of Ottawa Brain and Mind Research Institute / Ottawa Institute of Systems Biology, Canada
  • Graeme P. Taylor, University of Ottawa Brain and Mind Research Institute / Ottawa Institute of Systems Biology, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada
  • Steffany Bennett, University of Ottawa, Canada
  • Theodore J. Perkins, Ottawa Hospital Research Institute, Canada

Short Abstract: Glycosphingolipids such as α- and β-glucosylceramides (GlcCers) and α- and β-galactosylceramides (GalCers) are stereoisomers differentially synthesized by gut bacteria and their mammalian hosts in response to environmental insult. Thus, lipidomic assessment of α- and β-GlcCers and α- and β-GalCers is crucial for biomarker discovery and pathomechanistic studies. However, simultaneous quantification of these stereoisomeric lipids is difficult due to their virtually identical structure. Differential mobility spectrometry (DMS) as an orthogonal separation in LC-ESI-MS/MS can discriminate stereoisomeric lipids with different functions. Currently, the development of an LC-ESI-DMS-MS/MS method for lipidomics analysis demands intensive manual optimization and relies exclusively on the availability of synthetic lipid standards. Where synthetic standards do not exist, method development is not possible. We are building an algorithm using a machine learning approach to determine optimal instrument parameter values. We describe here the development of a neural network that predicts ion intensities of product ions following LC and orthogonal DMS separation of pairs of lipid stereoisomers at various DMS parameter settings. This enables the user to replace manual, empirical parameter optimization with much more efficient in silico model-based optimization and provides a means to predict DMS parameters when synthetic lipid standards are not available.

Cross-species comparison of metabolic features associated with aggregating tau protein
COSI: CompMS COSI
  • Vrinda Kalia, Columbia University, United States
  • Megan Niedzwiecki, Icahn School of Medicine at Mount Sinai, United States
  • Joshua Bradner, Columbia University, United States
  • Douglas Walker, Icahn School of Medicine at Mount Sinai, United States
  • Dean Jones, Emory University, United States
  • William Hu, Emory University, United States
  • Gary Miller, Columbia University, United States

Short Abstract: The formation of hyperphosphorylated tau (p-tau) protein tangles in neurons is a pathological marker of Alzheimer’s disease (AD). AD has also been associated with altered metabolism in animal and epidemiological studies. To discern the association between p-tau protein and metabolism, we performed high-resolution metabolomic analysis on cerebrospinal fluid (CSF) obtained from 26 AD patients and 25 controls. Using the BR5270 Caenorhabditis elegans strain, which expresses aggregating tau protein in all neurons, we studied the effect of aggregating tau protein on metabolism using high-resolution metabolomic analysis in the whole worm. In the population study, we found 255 features associated (p<0.05) with p-tau levels in human CSF. These features were enriched in the fatty acid and amino acid metabolism pathways. Worms expressing aggregating tau showed 900 altered features. These features suggested alterations in glycerophospholipid, fatty acid, and amino acid metabolism pathways. To determine which metabolic features are altered in both species, we analyzed annotated features for overlap. We also created a metabolite-protein interaction network using these features on omicsnet.ca. The walktrap algorithm identified 8 modules enriched for glycerophospholipid, arachidonic acid, and linoleate metabolism. Thus, in this preliminary analysis we find that several pathways associated with AD in population studies are also altered in worms expressing aggregating tau protein.

Democratizing DIA analysis on public cloud infrastructures via Galaxy
COSI: CompMS COSI
  • Matthias Fahrner, Institute for Surgical Pathology, Faculty of Medicine, University of Freiburg, Germany
  • Melanie Christine Föll, Institute for Surgical Pathology, Faculty of Medicine, University of Freiburg, Germany
  • Björn Andreas Grüning, Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany
  • Oliver Schilling, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany

Short Abstract: Data independent acquisition (DIA) has become one of the most important approaches in global proteomic studies. DIA data provides detailed and in-depth insights into the molecular variety of biological systems. However, due to the high complexity and large data size, the data analysis remains challenging. Available open-source software requires different operating systems, programming skills, and large compute infrastructures. Thus, current open-source DIA data analysis is mainly feasible for bioinformatics-competent researchers with access to large computational resources, and it often lacks reproducibility and usability. Here we present a straightforward workflow containing all essential DIA analysis steps based on OpenSwath, pyprophet, diapysef and swath2stats, which can be applied and adapted by a large user community without the need for tool installations, special computing resources or programming skills. The all-in-one DIA workflow in Galaxy drastically increases the robustness, reproducibility and speed of the DIA data analysis due to parallel processing of multiple inputs using Galaxy's HPC and cloud infrastructure. Each tool is available as a Conda package and Biocontainer. However, a few steps in this workflow require up to 1 TB of memory; hence, we recommend using the workflow on the European Galaxy server (usegalaxy.eu), which can utilize worldwide HPC and cloud resources.

FDR-controlled detection of peptides in narrow-window DIA data for spectral library generation
COSI: CompMS COSI
  • Lilian R. Heil, University of Washington, United States
  • William E. Fondrie, University of Washington, United States
  • Uri Keich, University of Sydney, Australia
  • Michael J. MacCoss, University of Washington, United States
  • William S. Noble, University of Washington, United States

Short Abstract: Data independent acquisition (DIA) has emerged as a powerful method to systematically fragment and measure all peptides in a sample. Quantitative analysis of DIA data can be performed using targeted signal extraction with a spectral library to detect and quantify peptides with high sensitivity and accuracy. The success of such analyses requires a high quality library containing accurate information about which peptides are detectable in a sample matrix, their retention times, and relative intensities of fragment ions. Searle et al. found that generating a sample-specific library with gas phase fractionation to collect narrow-window DIA can improve detections in quantitative DIA runs. Here, we describe a strategy to detect peptides in this narrow-window DIA data with standard database search tools. The approach uses a modified target-decoy competition protocol to calculate peptide-level FDR for any number of peptides per spectrum, as opposed to limiting results to one peptide per spectrum. A key component of our strategy is the use of Gaussian smoothing of peptide-spectrum match scores in the time domain, which we demonstrate yields improved statistical power to detect peptides. The overall approach allows us to build a high-quality spectral library from narrow-window DIA data.
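
As background for the FDR estimate mentioned above, the following sketch computes q-values with standard target-decoy competition. It is not the authors' modified protocol (which allows any number of peptides per spectrum and adds Gaussian smoothing of scores over time); the `(decoys + 1) / targets` estimator and all names here are assumptions chosen to show the baseline idea.

```python
def tdc_qvalues(scores, is_decoy):
    """Estimate q-values from scored target and decoy matches.

    scores: higher is better. is_decoy: parallel list of booleans.
    Returns q-values in the original order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Conservative estimate of the FDR among targets at this threshold.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value: the lowest FDR at which this match would still be accepted.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for position in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[position])
        qvalues[order[position]] = running_min
    return qvalues
```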

Improved Identification of Modified Peptides using Localization-aware Open Search
COSI: CompMS COSI
  • Fengchao Yu, University of Michigan, United States
  • Guo Ci Teo, University of Michigan, United States
  • Andy Kong, University of Michigan, United States
  • Sarah Haynes, University of Michigan, United States
  • Dmitry Avtonomov, University of Michigan, United States
  • Daniel Geiszler, University of Michigan, United States
  • Alexey Nesvizhskii, University of Michigan, United States

Short Abstract: Discovering post-translational modifications (PTMs) is a crucial yet challenging task in liquid chromatography mass spectrometry-based proteomics. Various tools have been developed within the open search framework. By virtue of its highly efficient indexing algorithm, MSFragger is one of the fastest. However, MSFragger historically did not index fragment ions modified by unknown mass shifts. Including only unmodified ions in the index precludes matching any fragment peaks that contain unexpected modifications. Here we present an extension of MSFragger, termed localization-aware open search, in which both modified and unmodified ions are effectively indexed and used in scoring. To reduce random matches, we also developed a mass calibration method that allowed us to tighten the mass tolerances and other related parameters without sacrificing sensitivity. With localization-aware open search, we observe a significant increase in sensitivity and precision, which enables us to identify and localize various modifications in a complex data set. Using a phosphorylation-enriched data set and a simulated data set, we demonstrate that the addition of the shifted ion index increases sensitivity by about 19% and 65%, respectively, enabling MSFragger to identify the most peptide-spectrum matches among its competitors. Finally, we compare run times among multiple tools and find that the speed of MSFragger is still unmatched.

Integrated Feature Extraction Enhances the Metabolome Coverage in Data-Dependent LC-MS/MS-Based Metabolomics
COSI: CompMS COSI
  • Yaxi Hu, University of British Columbia, Canada
  • Tao Huan, University of British Columbia, Canada

Short Abstract: In untargeted metabolomics, conventional data preprocessing programs (e.g., XCMS, MZmine 2, MS-DIAL) are used extensively due to their high efficiency in metabolic feature extraction. However, these programs have limitations in recognizing low-abundance metabolic features, which hinders complete metabolome coverage. To address this bioinformatic challenge, we explored the possibility of enhancing the metabolome coverage of data-dependent liquid chromatography-tandem mass spectrometry (LC-MS/MS) results by rescuing metabolic features that are missed by conventional software. We assessed the false positives and quantitative accuracy of the metabolic features that contain MS/MS spectra but are not recognized by conventional software. Our results indicate that these missed features contain valid and important metabolic information and should be integrated into the conventional metabolomics results. Thus, we developed a data-preprocessing pipeline to extract low-abundance metabolic features and integrate them with the results from conventional programs. This strategy was tested on a set of metabolomic data. Our results show that this integrated feature extraction strategy remarkably improves the metabolome coverage beyond that of conventional data preprocessing, thereby facilitating the confirmation of metabolites of interest and achieving a higher success rate in de novo metabolite identification.

IPC 2.0 - Isoelectric Point Calculator 2.0 - prediction of isoelectric point and pKa dissociation constants using deep learning
COSI: CompMS COSI
  • Lukasz Kozlowski, Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Poland

Short Abstract: In proteins and peptides, the isoelectric point (pI), the pH at which a particular molecule is electrically neutral due to the equilibrium of positive and negative charges, depends on the dissociation constants (pKa) of the charged groups of seven amino acids and of the NH+ and COO- groups at the polypeptide termini. Information about pI and pKa values is used extensively in 2D gel electrophoresis (2D-PAGE), capillary isoelectric focusing (cIEF), X-ray crystallography, and mass spectrometry (MS). Therefore, there is a strong need for in silico prediction of pI and pKa values. Here, I present Isoelectric Point Calculator 2.0 (IPC 2.0), a web server for the prediction of isoelectric point and pKa values using a deep learning (DL) approach. The tool predicts pI for peptides and proteins using two separate DL models trained on different datasets. Additionally, for proteins (sequences > 50 aa), a pKa value is predicted for each charged residue (with separate DL models for the individual pKa of each amino acid). The models use the sequence and features derived from it (the informative features were selected from AAindex). IPC 2.0 outperforms previous algorithms, with a prediction accuracy (RMSD) of 0.69762 for proteins and 0.10659 for peptides. IPC 2.0 is freely available (public domain) at ipc2.mimuw.edu.pl
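
To make the definition of pI concrete, here is a minimal Python sketch of the classical calculation that deep learning predictors such as IPC 2.0 aim to improve on: the net charge is summed with the Henderson-Hasselbalch equation over the seven ionizable side chains and the two termini, and the pH of zero net charge is found by bisection. The pKa values below are illustrative textbook-style numbers chosen for this example, not the values learned or used by IPC 2.0, and the function names are hypothetical.

```python
# Illustrative pKa values (assumption; not taken from IPC 2.0).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}            # protonated form is +1
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # deprotonated form is -1
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6

def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))   # C-terminal carboxyl
    for aa in sequence:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge

def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH; net charge decreases monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid   # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0

pi_example = isoelectric_point("ACDEFGHIK")
```

With fixed pKa values, a peptide without ionizable side chains has a pI halfway between the two terminal pKa values; sequence-dependent shifts of individual pKa values are what a learned model can capture and this sketch cannot.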

Isolation forests improve the capability to detect quality problems in mass spectrometry-based proteomics
COSI: CompMS COSI
  • Akshay Kulkarni, Khoury College of Computer Science, Northeastern University, United States
  • Eralp Dogu, College of Science, Mugla Sitki Kocman University, Turkey
  • Roger Olivella, Proteomics Unit, Centre de Regulació Genòmica, Spain
  • Eduard Sabido, Proteomics Unit, Centre de Regulació Genòmica, Spain
  • Olga Vitek, Northeastern University, United States

Short Abstract: Quality control (QC) of mass spectrometry-based proteomic experiments involves quantifying a standard mixture that includes a set of analytes, generating multiple metrics. The metrics are then used to evaluate the effects of technical variability on the quantification of the actual biological samples. Next, a reliable baseline data set is used to train statistical models or traditional statistical quality control methods to classify outlying runs. Although current technological improvements help conduct the initial steps of most QC workflows, many practitioners still lack baseline data and misclassify QC runs in real-time implementations.

Here, we present an unsupervised machine learning extension of MSstatsQC to detect deviations from optimal performance across multiple metrics. MSstatsQC implements unsupervised isolation-based trees to address the outlier detection problem. Our results show that tree-based methods help differentiate optimal from sub-optimal experiments when limited information is available about the optimal or sub-optimal performance of the instrument. We also provide supporting information on the root causes of anomalous behavior per peptide, which can be used to design preventive actions. Our method is available in the MSstatsQC R/Bioconductor package and in the web-based graphical user interface MSstatsQCgui.
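
The idea behind isolation-based trees can be sketched in a few lines of Python. This is a generic, minimal isolation forest and not the MSstatsQC implementation (which is an R package); the function names, the two toy QC metrics and all parameter values are assumptions made for this example.

```python
import math
import random

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    values = [r[feature] for r in rows]
    lo, hi = min(values), max(values)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }

def average_path(n):
    """Expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def path_length(tree, row, depth=0):
    if "feature" not in tree:  # leaf: add the expected depth of the unsplit remainder
        return depth + average_path(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Scores near 1 mean easy to isolate (anomalous); near 0.5 or below mean typical."""
    if len(rows) < 2:
        raise ValueError("need at least two runs")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path(sample_size)
    return [2 ** (-sum(path_length(t, r) for t in trees) / (n_trees * norm))
            for r in rows]

# Toy QC data (hypothetical): (retention time, peak area) per run.
data_rng = random.Random(1)
qc_runs = [(data_rng.gauss(10.0, 0.3), data_rng.gauss(100.0, 2.0)) for _ in range(30)]
qc_runs.append((25.0, 30.0))  # one clearly deviating run
scores = anomaly_scores(qc_runs)
```

No labelled baseline of "good" runs is required: a run that deviates on any metric tends to be separated from the rest after very few random splits, which gives it a short average path and a high score.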

Mass spectrometry imaging in the age of reproducible medical science
COSI: CompMS COSI
  • Melanie Christine Föll, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany
  • Lennart Moritz, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany
  • Thomas Wollmann, Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Germany
  • Maren Nicole Stillger, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany
  • Niklas Vockert, Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Germany
  • Martin Werner, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany
  • Peter Bronsert, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany
  • Karl Rohr, Biomedical Computer Vision Group, BioQuant, IPMB, Heidelberg University, Germany
  • Björn Andreas Grüning, University of Freiburg, Germany
  • Oliver Schilling, Institute of Surgical Pathology, Medical Center – University of Freiburg, Germany

Short Abstract: Mass spectrometry imaging (MSI) has great potential for a variety of clinical research areas including pharmacology, diagnostics and personalized medicine. MSI data analysis remains challenging due to the large and complex data generated by the measurement of hundreds of analytes in thousands of tissue locations. Reproducibility of published research is limited due to extensive use of proprietary software and in-house scripts. Existing open-source software that paves the way for reproducible data analysis has a steep learning curve for scientists without programming knowledge. Therefore, we have integrated 18 MSI tools into the Galaxy framework (usegalaxy.eu) to allow easily accessible data analysis with high levels of reproducibility and transparency. The tools are based on Cardinal, MALDIquant and scikit-image, enabling all major MSI analysis steps from quality control to image co-registration, preprocessing and statistical analysis. We successfully applied the MSI tools in combination with other proteomics and metabolomics Galaxy tools to analyze a publicly available N-linked glycan imaging dataset, as well as in-house peptide imaging cancer datasets. Furthermore, we created hands-on training material for use cases in proteomics and metabolomics and provide a Docker container for a fully functional analysis platform in a closed network situation, such as in clinical settings.

Matching peptides to data independent acquisition mass spectrometry data
COSI: CompMS COSI
  • Yang Lu, University of Washington, United States
  • Wenruo Bai, University of Washington, United States
  • Jeffrey A. Bilmes, University of Washington, United States
  • William S. Noble, University of Washington, United States

Short Abstract: Perhaps the primary challenge in analyzing DIA data is to tackle the interference that occurs when the observed signal associated with one peptide overlaps with signals from another peptide. DIASearch explicitly models this interference by constructing a bipartite graph that jointly matches all DIA spectra and a precursor database. Each edge connects a spectrum and a precursor, and its weight reflects a variety of features, including fragment matching, precursor intensity, the difference between observed and predicted retention time, and consistency of matches across charge states and across time.

Combining these features is nontrivial because each feature exhibits biases specific to retention time, m/z, charge, peptide length or combinations thereof. DIASearch therefore performs feature-wise calibration before aggregating the scores. DIASearch then selects confident precursors in a greedy fashion according to this score.

DIASearch is applied to a variety of datasets that differ in sample complexity, instrument type, gradient length, and isolation window size. Empirical results show that DIASearch outperforms existing methods such as PECAN, Prosit, and DIA-Umpire. In particular, DIASearch is capable of detecting peptides with weak or undetectable MS1 signal from DIA data, which will not be detected by DIA-Umpire.

MealTime-MS: A machine learning-guided real-time mass spectrometry analysis for protein identification and efficient dynamic exclusion
COSI: CompMS COSI
  • Yun-En Chung, University of Ottawa, Canada
  • Alexander R. Pelletier, University of Ottawa, Canada
  • Zhibin Ning, University of Ottawa, Canada
  • Nora Wong, University of Ottawa, Canada
  • Daniel Figeys, University of Ottawa, Canada
  • Mathieu Lavallée-Adam, University of Ottawa, Canada

Short Abstract: While mass spectrometry-based proteomics can identify thousands of proteins in a biological sample, commonly used mass spectrometry data acquisition approaches suffer from a poor identification sensitivity of low abundance proteins. In a typical protein identification experiment, mass spectra are preferentially collected from proteins with higher abundance. The identification of these proteins is then performed after the completion of the experiment. Such an approach typically results in the redundant acquisition of mass spectra from proteins with high abundance, while very few are collected for low abundance proteins, which therefore remain unidentified. Hence, we propose a novel supervised learning-based algorithm (MealTime-MS) that identifies proteins in real-time as mass spectrometry data are acquired and prevents redundant data acquisition from already confidently identified proteins. Using in-silico simulations of a mass spectrometry analysis of a HEK293 cell lysate, we demonstrate that MealTime-MS successfully identifies 92.1% of the proteins normally detected in the experiment without any data exclusion, while using only 66.2% of the mass spectra. We also show that our approach outperforms a previously proposed method, and is sufficiently fast for real-time mass spectrometry analysis. Finally, MealTime-MS’ efficient usage of mass spectrometry resources will provide the tools for a more comprehensive characterization of proteomes.

metabCombiner: Paired Untargeted LC-HRMS Metabolomics Feature-Matching and Concatenation of Disparately Acquired Datasets
COSI: CompMS COSI
  • Hani Habra, University of Michigan, United States
  • Alla Karnovsky, University of Michigan, United States
  • Charles Evans, University of Michigan, United States

Short Abstract: A key step in the analysis of untargeted LC-MS metabolomics data is the alignment of features, characterized by mass-to-charge ratio (m/z) and retention time (rt), detected in individual samples. Most existing software focuses on aligning features detected in samples analyzed under roughly identical conditions. As experimental protocols vary across institutions and may be modified within the same lab, an alternative approach is needed to achieve a correspondence between identical compounds in disparate assays.

We present metabCombiner, an R package that implements a workflow for matching known and unknown features in a pair of untargeted LC-MS metabolomics datasets. The output is a file containing an overlap of two feature lists with concatenated measurements, providing increased power to statistical analyses and orthogonal information for curation of metabolite identities. The key steps in this pipeline are: 1) separate preprocessing and filtering of input feature lists; 2) grouping features from both datasets by m/z values; 3) retention time spline fitting through selected ordered pairs; 4) pairwise similarity scoring of paired features using differences in m/z, retention time (projected vs observed), and relative abundance. We demonstrate metabCombiner on metabolomics data acquired from different sample types (plasma, urine, muscle), analyzed with varied experimental parameters by different institutions.
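
Steps 2 to 4 of the pipeline above can be illustrated with a simplified Python sketch. It is not the metabCombiner code (an R package): a piecewise-linear map through anchor pairs stands in for the spline fit, and the exponential scoring form, its weights, the m/z tolerance and every name below are assumptions made for this example.

```python
import math
from bisect import bisect_right

def project_rt(rt, anchors):
    """Map a dataset-X retention time onto the dataset-Y scale.

    anchors: at least two (rt_x, rt_y) pairs sorted by rt_x with distinct rt_x.
    Piecewise-linear interpolation is used here in place of a spline.
    """
    xs = [a[0] for a in anchors]
    i = min(max(bisect_right(xs, rt), 1), len(anchors) - 1)
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (rt - x0) * (y1 - y0) / (x1 - x0)

def match_features(feats_x, feats_y, anchors, mz_tol=0.005, weights=(100.0, 1.0, 0.5)):
    """Pair each X feature with its best-scoring Y feature within the m/z tolerance.

    Each feature is a dict with keys id, mz, rt and q (relative abundance, 0..1).
    Score = exp(-(A*|d_mz| + B*|d_rt| + C*|d_q|)), so 1.0 is a perfect match.
    """
    a, b, c = weights
    matches = []
    for fx in feats_x:
        best = None
        for fy in feats_y:
            d_mz = abs(fx["mz"] - fy["mz"])
            if d_mz > mz_tol:           # step 2: group by m/z
                continue
            d_rt = abs(project_rt(fx["rt"], anchors) - fy["rt"])  # step 3: projected vs observed rt
            d_q = abs(fx["q"] - fy["q"])
            score = math.exp(-(a * d_mz + b * d_rt + c * d_q))    # step 4: similarity
            if best is None or score > best[2]:
                best = (fx["id"], fy["id"], score)
        if best is not None:
            matches.append(best)
    return matches

# Toy inputs (hypothetical values).
anchors = [(1.0, 1.5), (5.0, 6.0), (10.0, 11.5)]
feats_x = [{"id": "x1", "mz": 180.0634, "rt": 3.0, "q": 0.4}]
feats_y = [
    {"id": "y1", "mz": 180.0636, "rt": 3.8, "q": 0.5},
    {"id": "y2", "mz": 180.0635, "rt": 9.0, "q": 0.4},
    {"id": "y3", "mz": 250.1000, "rt": 3.75, "q": 0.4},
]
matches = match_features(feats_x, feats_y, anchors)
```

In the toy data, y2 is closest in m/z but far from the projected retention time, so the retention-time term decides the match in favour of y1.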

MetaboAnalystR 3.0: Towards Optimized Workflow for Global Metabolomics
COSI: CompMS COSI
  • Zhiqiang Pang, Institute of Parasitology, McGill University, Canada
  • Jasmine Chong, Institute of Parasitology, McGill University, Canada
  • Shuzhao Li, The Jackson Laboratory for Genomic Medicine, United States
  • Jianguo Xia, Institute of Parasitology, McGill University, Canada

Short Abstract: Liquid chromatography coupled to high-resolution mass spectrometry platforms are increasingly employed to comprehensively measure metabolome changes in systems biology and complex diseases. Over the past decade, several powerful computational pipelines have been developed for spectral processing, annotation and analysis. However, significant obstacles remain with regard to parameter settings, computational efficiency, batch effects, and functional interpretation. Here we introduce MetaboAnalystR 3.0, a significantly improved pipeline with three key new features: 1) efficient parameter optimization for peak picking; 2) automated batch effect correction; and 3) more accurate pathway activity prediction. Our benchmark studies showed that this workflow was 20 to 100 times faster than other well-established workflows and produced more biologically meaningful results. In summary, MetaboAnalystR 3.0 offers an efficient pipeline to support high-throughput global metabolomics in the open-source R environment.

Metabolomic data analysis in drug discovery: revealing elicitation of secondary metabolites in Antarctic bacteria.
COSI: CompMS COSI
  • Kattia Núñez-Montero, Universidad de La Frontera, Chile
  • Leticia Barrientos, Universidad de La Frontera, Chile

Short Abstract: Concern about finding new drugs is increasing every year, particularly against drug-resistant pathogens. Antarctic bacteria have been proposed as an unexplored source of bioactive metabolites; however, most biosynthetic gene clusters (BGCs) producing secondary metabolites remain silent under common culture conditions. Our work aimed to characterize elicitation conditions for the production of antibacterial secondary metabolites from 34 Antarctic bacterial strains based on MS/MS metabolomics and genome mining approaches. Bacterial strains were cultivated under different nutrient and elicitation conditions, including the addition of lipopolysaccharide (LPS), sodium nitroprusside (SNP), and coculture. Metabolomes were obtained by HPLC-QTOF-MS/MS and analyzed through molecular networking. Antibacterial activity was determined, and seven strains were selected for genome sequencing and analysis. Biosynthesis pathways were activated by all the elicitation treatments, although the response varied among strains and depended on the culture medium. Increased antibacterial activity was observed for a few strains, and the addition of LPS was associated with inhibition of Gram-negative pathogens. Our data suggest that the tested conditions allowed the expression of interesting natural products, including putative actinomycin, carotenoids, and bacillibactin analogs. This work establishes the use of promising new elicitors for bioprospecting of Antarctic bacteria and highlights the importance of new “-omics” comparative approaches for drug discovery.

Mokapot: Fast and Flexible Semi-Supervised Learning for Peptide Detection
COSI: CompMS COSI
  • William E. Fondrie, University of Washington, United States
  • William S. Noble, University of Washington, United States

Short Abstract: Machine learning methods have played a pivotal role in boosting the sensitivity of peptide detection from database search results in proteomics experiments. One such method, Percolator, implements a semi-supervised learning algorithm that learns a support vector machine (SVM) classifier by iteratively accumulating confident peptide-spectrum matches (PSMs). Since its introduction, Percolator has proven to be popular and highly effective for many proteomics experiments. However, analyses conducted with Percolator remain fairly inflexible, requiring an internal three-fold cross validation scheme and restricted to the use of a linear SVM classifier. Additionally, multiple Percolator analyses are required to obtain separate peptide- and protein-level confidence estimates from distinct sets of PSMs, thereby discouraging complex experimental designs. Here we present Mokapot, a fast and flexible Python implementation of the Percolator algorithm, which improves upon Percolator in three ways: (1) Mokapot supports a wide variety of machine learning models, including any classifier available in scikit-learn; (2) Mokapot allows for specification of experimental groups, enabling complex experimental designs; and (3) Mokapot offers an alternative confidence estimation procedure specifically for cross-linked PSMs from cross-linking mass spectrometry experiments. Together, these improvements enable Mokapot to provide the benefits of the Percolator algorithm to a wider array of proteomics experiments.

New mixture models for decoy-free false discovery rate estimation in mass-spectrometry proteomics
COSI: CompMS COSI
  • Yisu Peng, Northeastern University, United States
  • Yong Li, Illumina Inc., United States
  • Michal Gregus, Northeastern University, United States
  • Alexander Ivanov, Northeastern University, United States
  • Olga Vitek, Northeastern University, United States
  • Shantanu Jain, Northeastern University, United States
  • Predrag Radivojac, Northeastern University, United States

Short Abstract: Accurate estimation of the false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to inaccuracies and problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation efficiency and low signal-to-noise spectra.
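
For readers unfamiliar with the decoy-free idea, the Python sketch below fits a plain two-component Gaussian mixture to PSM scores with expectation-maximization and derives an FDR estimate from the fitted tails. It illustrates the conventional DFA baseline only; the new mixture models of this poster are not described in the abstract and are not reproduced here. The Gaussian components, function names and simulated scores are assumptions made for this example.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_tail(t, mu, sigma):
    """P(X > t) for a Gaussian."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def fit_two_gaussians(scores, n_iter=100):
    """EM for a two-component mixture; component 0 = incorrect, 1 = correct PSMs."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    mu = [sum(ordered[:half]) / half, sum(ordered[half:]) / (len(ordered) - half)]
    mean = sum(scores) / len(scores)
    spread = max(math.sqrt(sum((x - mean) ** 2 for x in scores) / len(scores)), 1e-6)
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each score comes from component 1.
        resp = []
        for x in scores:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp.append(p1 / ((p0 + p1) or 1e-300))
        # M-step: re-estimate weight, mean and spread of each component.
        for k, r_k in ((0, [1.0 - r for r in resp]), (1, resp)):
            n_k = max(sum(r_k), 1e-12)
            weight[k] = n_k / len(scores)
            mu[k] = sum(r * x for r, x in zip(r_k, scores)) / n_k
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(r_k, scores)) / n_k
            sigma[k] = max(math.sqrt(var), 1e-6)
    return weight, mu, sigma

def mixture_fdr(threshold, weight, mu, sigma):
    """Estimated fraction of incorrect PSMs among those scoring above threshold."""
    false_tail = weight[0] * normal_tail(threshold, mu[0], sigma[0])
    true_tail = weight[1] * normal_tail(threshold, mu[1], sigma[1])
    total = false_tail + true_tail
    return false_tail / total if total > 0 else 0.0

# Simulated scores (hypothetical): 600 incorrect and 400 correct PSMs.
sim = random.Random(0)
scores = [sim.gauss(0.0, 1.0) for _ in range(600)] + [sim.gauss(5.0, 1.0) for _ in range(400)]
weight, mu, sigma = fit_two_gaussians(scores)
fdr_at_cutoff = mixture_fdr(2.5, weight, mu, sigma)
```

The sketch works well here because the simulated components are well separated and truly Gaussian; when the correct and incorrect score distributions overlap heavily or deviate from the assumed shape, the fitted tails and therefore the FDR estimate become unreliable, which is the kind of difficulty the abstract points to.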

Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics using Probabilistic Modeling
COSI: CompMS COSI
  • Ramtin Hosseini, Tufts University, United States
  • Neda Hassanpour, Tufts University, United States
  • Li-Ping Liu, Tufts University, United States
  • Soha Hassoun, Tufts University, United States

Short Abstract: Despite computational advances, interpreting untargeted measurements and determining their biological roles remains a challenge. We present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures metabolomics measurements and the biological network for the biological sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives probabilistic annotations. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a particular (user-defined) threshold. Due to the lack of “ground truth” metabolomics datasets, where all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets that are designed to mimic cellular processes. PUMA, on average, outperforms pathway enrichment analysis by 8%. When applied to case studies, PUMA annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA provides a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.

Reproducibility of Mass Spectrometry based Metabolomics Data
COSI: CompMS COSI
  • Tusharkanti Ghosh, University of Colorado, United States
  • Katerina Kechris, University of Colorado, United States
  • Debashis Ghosh, University of Colorado, United States

Short Abstract: Pdf uploaded.

Simple targeted assays for metabolic pathways and signaling: a powerful tool for targeted proteomics
COSI: CompMS COSI
  • Andreas Hentschel, Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Germany
  • Dominik Kopczynski, Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Germany
  • Robert Ahrends, Department of Analytical Chemistry, University of Vienna, Austria

Short Abstract: Protein quantification and validation are currently performed with isotope dilution in combination with targeted mass spectrometry approaches such as selected or parallel reaction monitoring (SRM / PRM). However, creating pathway-specific assays remains time-consuming even after years of improvements. For sophisticated analyses, high-quality and high-resolution reference data are mandatory to create targeted assays. Several published protein databases have addressed and partially improved those aspects. We introduce ‘Simple Targeted Assays for Metabolic Pathways and Signaling’ (STAMPS), a pathway-centric web service designed to develop targeted proteomics assays. Several intuitive interfaces provided in STAMPS guide the user towards rapid and simplified method design. Using our curated framework for signaling and metabolic pathways, we achieved a 148-fold reduction of the average development time of an assay in comparison to the second fastest state-of-the-art tool. Its core function is an interactive pathway interface (manually curated by domain experts) enabling the user to browse through a graph-based visualization of pathways, to (de)select proteins, to retrieve additional protein and metabolite information and to download the provided MS/MS spectra with a ‘few click’ solution. STAMPS is available as a web tool, free of charge for academic purposes, and can be accessed at stamps.isas.de.

Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships
COSI: CompMS COSI
  • Florian Huber, Netherlands eScience Center, Amsterdam, Netherlands
  • Lars Ridder, Netherlands eScience Center, Amsterdam, Netherlands
  • Justin J.J. van der Hooft, Wageningen University, Netherlands
  • Simon Rogers, University of Glasgow, United Kingdom

Short Abstract: Highlights:

Spectral similarity is key for many metabolomics analyses.

Here, we introduce Spec2Vec, a novel spectral similarity score that is more scalable and more proportional to structural similarity of molecules than traditional scores.

The advantages of Spec2Vec are shown in library searching for both exact matches and analogues as well as in molecular networking.